AITopics | exchangeable random partition

We introduce the microclustering Ewens--Pitman model for random partitions, obtained by scaling the strength parameter of the Ewens--Pitman model linearly with the sample size. The resulting random partition is shown to have the microclustering property, namely: the size of the largest cluster grows sub-linearly with the sample size, while the number of clusters grows linearly. By leveraging the interplay between the Ewens--Pitman random partition with the Pitman--Yor process, we develop efficient variational inference schemes for posterior computation in entity resolution. Our approach achieves a speed-up of three orders of magnitude over existing Bayesian methods for entity resolution, while maintaining competitive empirical performance.

information retrieval, machine learning, random partition, (21 more...)

arXiv.org Machine Learning

2507.18101

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling

Mingyuan Zhou

Neural Information Processing SystemsFeb-9-2025, 00:59:33 GMT

The beta-negative binomial process (BNBP), an integer-valued stochastic process, is employed to partition a count vector into a latent random count matrix. As the marginal probability distribution of the BNBP that governs the exchangeable random partitions of grouped data has not yet been developed, current inference for the BNBP has to truncate the number of atoms of the beta process. This paper introduces an exchangeable partition probability function to explicitly describe how the BNBP clusters the data points of each group into a random number of exchangeable partitions, which are shared across all the groups. A fully collapsed Gibbs sampler is developed for the BNBP, leading to a novel nonparametric Bayesian topic model that is distinct from existing ones, with simple implementation, fast convergence, good mixing, and state-of-the-art predictive performance.

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Asia > Middle East > Jordan (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
Europe > Spain > Galicia > Madrid (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.39)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.35)

Add feedback

Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling

Neural Information Processing SystemsMar-13-2024, 07:59:59 GMT

The beta-negative binomial process (BNBP), an integer-valued stochastic process, is employed to partition a count vector into a latent random count matrix. As the marginal probability distribution of the BNBP that governs the exchangeable random partitions of grouped data has not yet been developed, current inference for the BNBP has to truncate the number of atoms of the beta process. This paper introduces an exchangeable partition probability function to explicitly describe how the BNBP clusters the data points of each group into a random number of exchangeable partitions, which are shared across all the groups. A fully collapsed Gibbs sampler is developed for the BNBP, leading to a novel nonparametric Bayesian topic model that is distinct from existing ones, with simple implementation, fast convergence, good mixing, and state-of-the-art predictive performance.

bnbp topic model, exchangeable random partition, partition, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Asia > Middle East > Jordan (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
Europe > Spain > Galicia > Madrid (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.39)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.35)

Add feedback

Inductive Inference in Supervised Classification

Amiryousefi, Ali

arXiv.org Machine LearningMar-18-2021

Inductive inference in supervised classification context constitutes to methods and approaches to assign some objects or items into different predefined classes using a formal rule that is derived from training data and possibly some additional auxiliary information. The optimality of such an assignment varies under different conditions due to intrinsic attributes of the objects being considered for such a task. One of these cases is when all the objects' features are discrete variables with a priori known categories. As another example, one can consider a modification of this case with a priori unknown categories. These two cases are the main focus of this thesis and based on Bayesian inductive theories, de Finetti type exchangeability is a suitable assumption that facilitates the derivation of classifiers in the former scenario. On the contrary, this type of exchangeability is not applicable in the latter case, instead, it is possible to utilise the partition exchangeability due to John Kingman. These two types of exchangeabilities are discussed and furthermore here I investigate inductive supervised classifiers based on both types of exchangeabilities. I further demonstrate that the classifiers based on de Finetti type exchangeability can optimally handle test items independently of each other in the presence of infinite amounts of training data while on the other hand, classifiers based on partition exchangeability still continue to benefit from joint labelling of all the test items. Additionally, it is shown that the inductive learning process for the simultaneous classifier saturates when the amount of test data tends to infinity.

classifier, exchangeability, probability, (17 more...)

arXiv.org Machine Learning

2103.10549

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
(2 more...)

Genre:

Instructional Material (0.45)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Non-exchangeable random partition models for microclustering

Di Benedetto, Giuseppe, Caron, François, Teh, Yee Whye

arXiv.org Machine LearningNov-20-2017

Many popular random partition models, such as the Chinese restaurant process and its two-parameter extension, fall in the class of exchangeable random partitions, and have found wide applicability in model-based clustering, population genetics, ecology or network analysis. While the exchangeability assumption is sensible in many cases, it has some strong implications. In particular, Kingman's representation theorem implies that the size of the clusters necessarily grows linearly with the sample size; this feature may be undesirable for some applications, as recently pointed out by Miller et al. (2015). We present here a flexible class of non-exchangeable random partition models which are able to generate partitions whose cluster sizes grow sublinearly with the sample size, and where the growth rate is controlled by one parameter. Along with this result, we provide the asymptotic behaviour of the number of clusters of a given size, and show that the model can exhibit a power-law behavior, controlled by another parameter. The construction is based on completely random measures and a Poisson embedding of the random partition, and inference is performed using a Sequential Monte Carlo algorithm. Additionally, we show how the model can also be directly used to generate sparse multigraphs with power-law degree distributions and degree sequences with sublinear growth. Finally, experiments on real datasets emphasize the usefulness of the approach compared to a two-parameter Chinese restaurant process.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

1711.07287

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Virginia > Alexandria County > Alexandria (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.51)

Industry: Consumer Products & Services (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling

Zhou, Mingyuan

Neural Information Processing SystemsDec-31-2014

The beta-negative binomial process (BNBP), an integer-valued stochastic process, is employed to partition a count vector into a latent random count matrix. As the marginal probability distribution of the BNBP that governs the exchangeable random partitions of grouped data has not yet been developed, current inference for the BNBP has to truncate the number of atoms of the beta process. This paper introduces an exchangeable partition probability function to explicitly describe how the BNBP clusters the data points of each group into a random number of exchangeable partitions, which are shared across all the groups. A fully collapsed Gibbs sampler is developed for the BNBP, leading to a novel nonparametric Bayesian topic model that is distinct from existing ones, with simple implementation, fast convergence, good mixing, and state-of-the-art predictive performance.

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.39)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.35)

Add feedback

Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling

Zhou, Mingyuan

arXiv.org Machine LearningDec-31-2014

The beta-negative binomial process (BNBP), an integer-valued stochastic process, is employed to partition a count vector into a latent random count matrix. As the marginal probability distribution of the BNBP that governs the exchangeable random partitions of grouped data has not yet been developed, current inference for the BNBP has to truncate the number of atoms of the beta process. This paper introduces an exchangeable partition probability function to explicitly describe how the BNBP clusters the data points of each group into a random number of exchangeable partitions, which are shared across all the groups. A fully collapsed Gibbs sampler is developed for the BNBP, leading to a novel nonparametric Bayesian topic model that is distinct from existing ones, with simple implementation, fast convergence, good mixing, and state-of-the-art predictive performance.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

1410.7812

Country: North America > United States > Texas (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.38)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.35)

Add feedback

Generalized Negative Binomial Processes and the Representation of Cluster Structures

Zhou, Mingyuan

arXiv.org Machine LearningOct-7-2013

The paper introduces the concept of a cluster structure to define a joint distribution of the sample size and its exchangeable random partitions. The cluster structure allows the probability distribution of the random partitions of a subset of the sample to be dependent on the sample size, a feature not presented in a partition structure. A generalized negative binomial process count-mixture model is proposed to generate a cluster structure, where in the prior the number of clusters is finite and Poisson distributed and the cluster sizes follow a truncated negative binomial distribution. The number and sizes of clusters can be controlled to exhibit distinct asymptotic behaviors. Unique model properties are illustrated with example clustering results using a generalized Polya urn sampling scheme. The paper provides new methods to generate exchangeable random partitions and to control both the cluster-number and cluster-size distributions.

artificial intelligence, machine learning, reparameterized gnbp, (17 more...)

arXiv.org Machine Learning

1310.18

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Asia > Middle East > Jordan (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
(2 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.36)

Add feedback

Filters

Collaborating Authors

exchangeable random partition

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Large-scale entity resolution via microclustering Ewens--Pitman random partitions

Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling

Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling

Inductive Inference in Supervised Classification

Non-exchangeable random partition models for microclustering

Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling

Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling

Generalized Negative Binomial Processes and the Representation of Cluster Structures